ABSTRACT

We have acquired the scientific knowledge of 
medicine based on observation and experience, but it 
has not always been so. Our ancestors experienced 
sickness and the fear of death before a rational picture 
could be made of them, and the medicine of that time 
was immersed in a system of beliefs, myths and rites. 
Presently, far too many diseases do not have proven
preventions or treatments. There is a dire need to
explore the biological, environmental, and
behavioral causal factors of these diseases. Precision
medicine is a new science that approaches the
prevention and treatment of disease by addressing
the specific genetic makeup of each person. It
deviates from the conventional “one-size-fits-all”
approach, in which treatment and prevention
strategies are designed for the average person, with
little regard for differences between individuals.

Chapter 1 - Introduction 

The amount of data being digitally collected and 
stored is vast and expanding rapidly. As a result, the 
science of data management and analysis is also 
advancing to enable organizations to convert this vast 
resource into information and knowledge that helps 
them achieve their objectives. Computer scientists
have coined the term big data to describe this
evolving technology. Big data has been successfully
used in astronomy, retail sales, search, and politics.
Here we discuss its use in health care.


Recent years have seen the birth of personalized
genomics, which tells you your risk factors. This
opens the door to personalized medicine, which adjusts
treatments to patients depending on their genome. It
uses information from a person’s genes and proteins
to prevent, diagnose and treat disease, all thanks to
the sequencing of the human genome.

The interest in the term genomic medicine has grown 
lately, partly because drugs are rarely 100% effective 
and safe and partly because of developments such as 
the Human Genome Project. These developments 
have made it possible to identify subtypes of various 
diseases on the basis of genetics in addition to other 
means such as histology, an ability that many believe 
will lead to an improved capacity to prevent and treat 
various diseases. For example, knowledge of genetics 
could help to determine whether patients with certain 
disease subtypes are more likely than others to be 
responsive to a particular drug. On the face of it, there 
seems to be agreement about what genetic medicine 
entails. 

Approximately 99.9 percent of the 3 billion base pairs 
in an individual’s genome are the same as in any other 
member of the human race. All the differences and 
features that make someone unique are encoded by 
only 0.1 percent of their DNA. But 0.1 percent of the 
genome corresponds to 3 million base pairs, so there 
is the potential for a lot of genetic differences. 
Included in these differences are some sixty brand
new mutations, changes in the sequence of bases in
your DNA that have never existed in any person ever
before. Everyone really is a mutant. All these genetic
differences determine not only differences in eye and
hair color between one person and another, but
also whether someone has a higher risk of lung cancer
or a lower risk of Alzheimer’s disease or a greater
chance of a heart attack. The ways that individuals
differ genetically from one another also allow the
forces of natural selection to work. If an individual
prospers and has lots of children that survive, their
genetic code will be conserved and passed down to
future generations.
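
The base-pair arithmetic in the paragraph above can be
checked with a few lines of Python. This is only a
sketch of the calculation; the constant names are
illustrative and the figures are the rounded ones
quoted in the text.

```python
# Rounded figures quoted in the text above.
GENOME_BASE_PAIRS = 3_000_000_000  # approximate size of one human genome
SHARED_PER_MILLE = 999             # 99.9 percent, expressed in tenths of a percent

# Integer arithmetic avoids floating-point rounding.
differing_base_pairs = GENOME_BASE_PAIRS * (1000 - SHARED_PER_MILLE) // 1000

print(differing_base_pairs)  # 3000000
```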



Figure 1 Individualized Medicine 


Genomic medicine attempts to build individualized
strategies for diagnostic or therapeutic
decision-making by utilizing patients’ genomic
information. Big Data analytics uncovers hidden
patterns, unknown correlations, and other insights by
examining large and varied data sets. While the
integration and manipulation of diverse genomic data
and comprehensive electronic health records on a Big
Data infrastructure pose challenges, they also
provide a feasible opportunity to develop an efficient
and effective approach to identifying clinically
actionable genetic variants for individualized
diagnosis and therapy.


Further examination of the existing definitions of
personalized medicine, however, reveals important
disparities among them. For example, personalized
medicine has been defined as

1. A medical model that proposes the
customization of healthcare, with decisions
and practices being tailored to the individual
patient by use of genetic or other information.

2. The tailoring of medical treatment to the
specific characteristics of each patient. It does
not literally mean the creation of drugs or
medical devices that are unique to a patient.
Rather, it involves the ability to classify
individuals into subpopulations that are
uniquely or disproportionately susceptible to a
particular disease or responsive to a specific
treatment.

3. A form of medicine that uses information
about a person’s genes, proteins, and
environment to prevent, diagnose, and treat
disease.

Chapter 2 - The Existing System

“One size fits all” is a promise that is rarely kept,
whether it is applied to clothing or to medical
treatments. Most medical therapies administered to
large groups of patients only help a subset of the
patients; frequently, we do not know why a particular
treatment did not work in a given patient. Indeed,
common diseases can have many different causes, and
the effectiveness of a particular therapy may depend
on the specific disease pathology in an individual
patient.

The practice of medicine has always been personal.
Doctors use extensive personal information about a
patient, including medical history, physical exam,
vital signs, family history, laboratory measures and
imaging tests, to determine a patient’s risk for certain
diseases and to make diagnoses.

There is a lot of money spent by pharmaceutical
companies and others on advertising the benefits of
modern medicine. Consequently, a lot of negative
information on the subject does not reach the public
domain. Although modern medicine has many
advantages and successes, for example, in the
treatment of trauma and emergencies, it also has
disadvantages and failures.

Figure 2 Current One-Size-Fits-All Approach vs
Individualized Approach

Chapter 3 - Limitations of the Existing System

• Many drugs don’t work for or harm the 
patients they are prescribed for due to a “one 
size fits all” approach. 

• Need to determine who will benefit and who 
will not and avoid “trial and error” therapy. 

• Currently disease is often detected too late,
which leads to high costs and poor outcomes.

• Need to detect and treat signs of disease before
it becomes a problem, not after the patient has
become ill.


Drug Efficacy: Less Than 50% of Drugs Work
on the Patient They are Prescribed For

Drug                Efficacy
Anti-Depressants    62%
Asthma              60%
Diabetes            57%
Arthritis           50%
Alzheimer           30%

Figure 2 Drug Effectiveness


Modern medicine primarily uses surgery, radiation,
and drugs to facilitate improvements in health and in
the treatment of various illnesses. It is primarily
involved in the treatment of the sick, unlike
alternative therapies, which also deal with the
maintenance of health. In some cases, the therapy
offered by conventional medicine is symptomatic
management instead of addressing the cause of the
illness. This can result in the progression of the
disease, as necessary lifestyle changes or corrective
treatments are not initiated.


Adverse drug reactions are common: every
year more than 2 million North Americans are
hospitalized because of adverse reactions to
prescription drugs [2]. The reason why drugs work in
some people and cause bad reactions in others can
usually be traced back to differences in genetic
makeup. Medicines that work for most people may
not work for someone else. They may, indeed, harm
that individual. So we need two things: first, we need
ways of predicting and detecting disease well before it
becomes life threatening; and second, we need
medicines that work for each person and their unique
body.

We tend to view medical progress as some sort of 
continuum, along which we develop better drugs to 
fight whatever diseases are prevalent, better machines 
to image our insides and detect problems, better 
devices to use when joints wear out or eyes fail, better 
ways to treat pain or depression or loneliness; and we 
might be inclined to believe that the future holds more 
of the same. But it is not going to happen that way. 
Medical progress to this point has been mainly based 
on advances that benefit the population as a whole 
rather than you as an individual. Two hundred years 
ago, the average life span in England was only about 
forty years, largely because two-thirds of all children 
died before age four. There are statistics available
suggesting that modern medicine is responsible for
many deaths in America annually.

Doctors do not seek to intentionally cause harm, but
the results can be terrible. Especially for chronic
diseases like cancer and heart disease, treatments
often do not work. Unintended deaths are estimated
at 783,000 per year, higher than the deaths in 2001
caused by heart disease (699,697) or by cancer
(553,251).

Included in this number are 106,000 deaths per year
from “adverse drug reactions”, that is, from properly
prescribed drugs. That is more than 1 million deaths
over a 10-year period. When all unintended deaths,
including those due to prescription drugs and
“medical errors,” are combined, the number over the
same period jumps to 7.8 million.
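
The ten-year totals quoted above follow directly from
the annual estimates. The short sketch below repeats
the calculation; the variable names are illustrative,
and the inputs are the annual figures given in the
text, not independently verified data.

```python
# Annual estimates quoted in the text above.
ADVERSE_DRUG_REACTION_DEATHS_PER_YEAR = 106_000
UNINTENDED_DEATHS_PER_YEAR = 783_000
YEARS = 10

adr_deaths_decade = ADVERSE_DRUG_REACTION_DEATHS_PER_YEAR * YEARS
unintended_deaths_decade = UNINTENDED_DEATHS_PER_YEAR * YEARS

print(adr_deaths_decade)         # 1060000, "more than 1 million"
print(unintended_deaths_decade)  # 7830000, roughly 7.8 million
```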

The drugs used in modern medicine all have side
effects, which can negatively affect one's health. There
is a school of thought which holds that modern
medicine is controlled by the pharmaceutical
companies. It suggests that doctors are encouraged to
prescribe certain medications for the financial gain of
these drug companies.

Chapter 4 - The Proposed System

Genomic medicine is an emerging branch of medicine
that uses an individual’s unique genetic makeup
to customize medical care. The present form of
genomic medicine is a direct result of the Human
Genome Project, which started in 1990 with the aim
of identifying and mapping all of the human genes.
Thirteen years and USD 3 billion later, the project had
successfully sequenced the human genome. The year
2007 marked another turning point with the
application of next generation sequencing (NGS)
technology to uncover the roles of rare individual
genetic variants in common diseases. The cost of
genome sequencing subsequently plummeted and
currently stands at around USD 1000. These
technological advances have led to a major leap
forward in the scope of genomics and its growing role
in the delivery of healthcare.

Figure 3 Cost to Sequence DNA per Megabase

Figure 4 Using Hadoop (HDFS) to process genomic
data

Cloud computing providers offer services that provide
the infrastructure, software, and programming
platforms to clients, and bear the cost of
development and maintenance. Compared to
creating and maintaining an in-house database, cloud 
computing is an economical approach to genomic data 
management because clients pay only for the services 
that they need. An example of an open-source 
framework used to develop infrastructure for 
processing genomic data in a cloud computing 
environment is Hadoop. It breaks the data into small 
fragments, distributes them across many data nodes, 
delivers the computational code to the nodes so that 
they are processed in parallel, and collectively 
assembles the results at the end. The parallel 
processing of many small pieces of data, known as 
MapReduce, greatly shortens the computing time. 
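
The split, map and reduce steps described above can be
illustrated on a single machine. The sketch below is
not Hadoop code and does not use HDFS; it is a minimal
Python imitation of the pattern in which threads stand
in for data nodes. The function names and the example
task (counting bases in a DNA string) are assumptions
chosen for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_fragments(sequence: str, fragment_size: int) -> list[str]:
    """Split step: break the input into fixed-size fragments."""
    return [sequence[i:i + fragment_size]
            for i in range(0, len(sequence), fragment_size)]


def map_count_bases(fragment: str) -> Counter:
    """Map step: count bases in one fragment, independently of the others."""
    return Counter(fragment)


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial counts into one."""
    return left + right


def count_bases(sequence: str, fragment_size: int = 4, workers: int = 4) -> Counter:
    """Run split, parallel map, and reduce over a DNA string."""
    fragments = split_into_fragments(sequence, fragment_size)
    # Threads stand in for the data nodes of a real cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_count_bases, fragments))
    return reduce(reduce_counts, partial_counts, Counter())


if __name__ == "__main__":
    print(count_bases("ACGTACGGTTAACC"))
```

In a real deployment the fragments would live on
different machines and the framework would handle
scheduling and failures; only the shape of the
computation is shown here.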


The Human Genome Project, completed in 2003, has 
provided scientists and clinicians with a diverse set of 
novel molecular tools that can be used to understand 
health and manage disease. 

The use of individual genetic information plays a key
role in modern medicine and will fundamentally
change the way we predict, prevent, diagnose and
treat diseases in the near future. For the first time in
the history of medicine, health care providers as well
as patients can use predictive tools to develop a new
model for health care based on health planning that is
proactive and preventive. This contrasts with the
current model of health care, which is reactive,
episodic, and geared toward acute crisis intervention
once disease is already manifest and largely
irreversible.

High-throughput genomics technology has made 
possible the era of precision medicine, an approach to 
healthcare that involves integrating a patient's genetic, 
lifestyle, and environmental data and then comparing 
these data to similar data collected for thousands of 
other individuals to predict illness and determine the 
best treatments. Precision medicine aims to tailor 
healthcare to patients by using clinically actionable 
genomic mutations to guide preventive interventions 
and clinical decision making. In the past 25 years, 
more than 4,000 Mendelian disorders have been 
studied at the genetic level. In addition, more than 80 
million genetic variants have been uncovered in the 
human genome. 

For people with a particular disease, using social
media to compare their digital self with that of many
others suffering from the same disorder will have a
major impact. By correlating the severity of disease
and the effectiveness of various therapies with
genomic and other “omic” data, we can expect the
development of improved, individualized treatments.



Figure 5 Uploading Genetic Information to Cloud for
analysis

[Figure labels: World’s clinical information; omic
biomarkers associated with more than 300 diseases
(1-12 studies per disease); disease-associated
outliers; “big data” storage and analytics]

Early forms of the digital version of you are
starting to appear, although progress has been slow
because of the enormous institutional, technical, and
societal issues involved. The deeply conservative
instincts of the medical profession have not helped
either. The first manifestation of the digital you is
your electronic medical record (EMR), sometimes
known as your electronic health record (EHR).

Some of the reasons for delay, it has to be admitted, 
are not the fault of the medical profession. Privacy is 
an enormous concern. Your EMR, because it is in
electronic form, is susceptible to the same sort of 
hacking as any other personal data stored on your 
computer or by your credit card company or by your 
bank. Clearly you don’t want an insurer or employer 
to get hold of your medical record without your 
authorization. 


Which mutated genes carry the driver mutations, the
mutations that cause the cancer to grow? The cancer
bioinformatics community is now tackling problems
like this one with some success. However, a person’s
cancer may have different mutations depending on
which tumor in their body is tested or from where in
any given tumor the sample is taken. This indicates
the scale of the task: samples taken from different
biopsies from that patient would need to be analyzed.


If we can determine causative genes, we can then try
to inhibit them. But personalized medicine is much
more useful than that, a fact that will become clear
as the digital you is compared with the digital
versions of thousands of other people.


So how does comparison of your data with data from 
other individuals help identify the causes of your 
disorders and result in a therapy meant just for you? 
One example concerns the relation between genotype 
(your genome) and phenotype (your personal traits). 
Comparison of genotype with phenotype over 
thousands or millions of individuals will reveal in 
exquisite detail how various genes contribute to every 
aspect of you, from the color of your hair to a 
tendency to lisp to athletic potential. If we analyze a 
subset of people taking a particular drug, we can start 
to correlate who’ll have symptoms such as dizziness, 
nausea, fatigue, or other nasty side effects according 
to their genetic makeup. The correlation between the 
genotype of people with a particular disease and their 
environment will start to bring to light the subtle 
relationship between environment and individual 
susceptibility to disease. 
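
One simple way to express the drug example above is
to compare how often a side effect occurs in people
who carry a given variant with how often it occurs in
those who do not. The sketch below computes that
ratio (a relative risk) for a made-up cohort. The
Patient type, its field names and the data are
hypothetical, and a real study would also need
significance testing and far larger samples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    carries_variant: bool  # hypothetical genotype flag for one variant
    had_side_effect: bool  # hypothetical phenotype flag, e.g. dizziness


def side_effect_rate(patients: list[Patient], carrier: bool) -> float:
    """Fraction of patients in one genotype group who had the side effect."""
    group = [p for p in patients if p.carries_variant == carrier]
    if not group:
        raise ValueError("no patients in the requested genotype group")
    return sum(p.had_side_effect for p in group) / len(group)


def relative_risk(patients: list[Patient]) -> float:
    """Side-effect rate in carriers divided by the rate in non-carriers."""
    baseline = side_effect_rate(patients, carrier=False)
    if baseline == 0:
        raise ValueError("no side effects among non-carriers; ratio undefined")
    return side_effect_rate(patients, carrier=True) / baseline


# Made-up cohort: 3 of 4 carriers and 1 of 4 non-carriers report the side effect.
cohort = (
    [Patient(True, True)] * 3 + [Patient(True, False)]
    + [Patient(False, True)] + [Patient(False, False)] * 3
)

print(relative_risk(cohort))  # 3.0
```
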
Figure 6 Proposed Diagnosis

When the pooled data were analyzed, it was found 
that conventional treatments actually made the 
condition worse, whereas other therapies were 
surprisingly helpful. CureTogether now has patient- 
reported reviews of treatments for arthritis, Crohn’s 
disease, and bipolar disorder, among other conditions. 
CureTogether was recently acquired by 23andMe, a 
direct-to-consumer genomic sequencing and analysis 
company, which will gather genetic information on 
patients to see how this data relates to effectiveness of 
treatment and the toxicities associated with various 
drug therapies. 


Genetic testing will become a cornerstone of cancer
diagnosis, allowing physicians to identify and classify
tumors based on their genetic signatures in addition to
their location in the body. Furthermore, results from
genetic testing can be useful to evaluate the prognosis
of an individual’s cancer and to cross-reference the
results to known treatment options for a patient’s
particular mutations.

Expectations are high that genetic testing will 
accurately predict the risk for various diseases and 
eventually lead to preventive and therapeutic 
interventions that are targeted to at-risk individuals 
based on their genetic profiles. 

Recent research has led to the development of Genetic
Risk Scores (GRS), which combine individual genetic
variants associated with a specific disease. Such risk
scores enable stratification of individuals into low-
and high-risk groups for common disorders such as
heart disease, diabetes and most cancers.
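
The text does not say how the variants are combined.
One common construction, assumed here, is a weighted
sum: each variant contributes its number of risk
alleles (0, 1 or 2) multiplied by a weight reflecting
its effect size. The variant names, weights and
threshold below are hypothetical placeholders, not
values from any published score.

```python
def genetic_risk_score(genotype: dict[str, int],
                       weights: dict[str, float]) -> float:
    """Weighted sum of risk-allele counts (0, 1 or 2 per variant)."""
    score = 0.0
    for variant, weight in weights.items():
        count = genotype.get(variant, 0)  # a missing variant counts as no risk alleles
        if count not in (0, 1, 2):
            raise ValueError(f"invalid allele count for {variant}: {count}")
        score += weight * count
    return score


def stratify(score: float, threshold: float) -> str:
    """Assign a low- or high-risk group using a single cut-off."""
    return "high" if score >= threshold else "low"


# Hypothetical weights and genotype, for illustration only.
weights = {"variant_a": 0.5, "variant_b": 0.25, "variant_c": 1.0}
genotype = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = genetic_risk_score(genotype, weights)
print(score)                           # 1.25
print(stratify(score, threshold=1.0))  # high
```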

In terms of insurance, GRS for breast cancer, for 
example, have been shown to have a better risk 
prediction than a score based on non-genetic risk 
factors (BMI, smoking status, alcohol, and family 
history of breast cancer) routinely assessed in 
insurance underwriting. Even better predictions can, 
however, be achieved using a combination of genetic 
and non-genetic factors. Along with traditional risk 
factors used in insurance underwriting, reliable GRS 
may become an additional or alternative technique for 
risk stratification of insurance applicants. 

Family-based analysis: Family-based NGS data
enable the discovery of disease-contributing de novo
mutations. Family-based research strategies can also
uncover mutations that contribute to recessive
diseases, inherited in homozygous or compound
heterozygous form. SeqHBase is a
reliable and scalable computational program that
manipulates genome-wide variants, functional
annotations and every-site coverage, and analyzes
whole genome sequencing data to identify
disease-contributing genes effectively. It is a Big
Data-based toolset designed to analyze large-scale
family-based sequencing data to quickly discover
de novo, inherited homozygous, and/or compound
heterozygous mutations.
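
The three inheritance patterns named above can be
stated as simple rules over a parent-child trio. The
sketch below is not SeqHBase and does not reproduce
its method. It is a simplified illustration that
assumes each variant is recorded as an
alternate-allele count (0, 1 or 2) for child, mother
and father, and it ignores sequencing coverage,
genotype quality and phasing. All names and data are
hypothetical.

```python
from typing import NamedTuple


class TrioVariant(NamedTuple):
    gene: str
    variant_id: str
    child: int   # alternate-allele count: 0, 1 or 2
    mother: int
    father: int


def is_de_novo(v: TrioVariant) -> bool:
    """Present in the child but absent from both parents."""
    return v.child > 0 and v.mother == 0 and v.father == 0


def is_inherited_homozygous(v: TrioVariant) -> bool:
    """Child carries two copies, one from each carrier parent."""
    return v.child == 2 and v.mother >= 1 and v.father >= 1


def compound_heterozygous_genes(variants: list[TrioVariant]) -> set[str]:
    """Genes where the child has one heterozygous variant from each parent."""
    from_mother, from_father = set(), set()
    for v in variants:
        if v.child != 1:
            continue
        if v.mother >= 1 and v.father == 0:
            from_mother.add(v.gene)
        elif v.father >= 1 and v.mother == 0:
            from_father.add(v.gene)
    return from_mother & from_father


# Made-up trio data for illustration.
variants = [
    TrioVariant("GENE1", "v1", child=1, mother=0, father=0),  # de novo
    TrioVariant("GENE2", "v2", child=2, mother=1, father=1),  # inherited homozygous
    TrioVariant("GENE3", "v3", child=1, mother=1, father=0),  # maternal
    TrioVariant("GENE3", "v4", child=1, mother=0, father=1),  # paternal, same gene
    TrioVariant("GENE4", "v5", child=1, mother=1, father=0),  # single heterozygous variant
]

print([v.variant_id for v in variants if is_de_novo(v)])               # ['v1']
print([v.variant_id for v in variants if is_inherited_homozygous(v)])  # ['v2']
print(compound_heterozygous_genes(variants))                           # {'GENE3'}
```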

Population-based analysis: A number of large-scale
population-based sequencing studies are under way.
For example, the Precision Medicine Initiative (PMI)
cohort program aims to sequence one million or more
American participants to improve our ability to
prevent and cure diseases based on individual
differences in genetic makeup, lifestyle, and
environmental factors. By 2025, over 100 million
human genomes could be sequenced. Therefore, it is
critical to develop statistical toolsets on a Big Data
infrastructure for analyzing the genomic data of
millions of people.

Projects such as the Deciphering Developmental
Disorders (DDD) study, which offers exome sequencing
to children with severe developmental disorders, report
that if a clinical exome were offered as a first-line
diagnostic test, 50% of these children would instantly
receive a diagnosis. With advances in genomic
technology, it should be possible, where relevant, to
identify the prime genetic cause of every rare
disorder. What underpinned the success of the DDD
project was the ability to match children at opposite
ends of the world to each other, using a database
called DECIPHER. Each child’s condition was
uncommon, and the doctors caring for that child
may never have encountered a similar condition
before. The DECIPHER database afforded the
opportunity to link children with the same genetic
result and phenotype. This added to the credibility of
the conclusion that the identified variant was indeed
the cause of the child’s condition.
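
The matching idea described above, linking cases that
share a genetic result and a phenotype, can be
sketched in a few lines. This is not the DECIPHER
interface or its matching algorithm. The record
layout, gene labels and phenotype terms below are
hypothetical, and real matching would use
standardized phenotype vocabularies and variant-level
detail.

```python
from itertools import combinations
from typing import NamedTuple


class CaseRecord(NamedTuple):
    case_id: str
    gene: str              # gene carrying the candidate variant
    phenotypes: frozenset  # observed clinical features


def find_matches(records, min_shared=1):
    """Return pairs of cases with the same gene and overlapping phenotypes."""
    matches = []
    for first, second in combinations(records, 2):
        shared = first.phenotypes & second.phenotypes
        if first.gene == second.gene and len(shared) >= min_shared:
            matches.append((first.case_id, second.case_id, sorted(shared)))
    return matches


# Made-up records for illustration.
records = [
    CaseRecord("case-1", "GENE_X", frozenset({"seizures", "short stature"})),
    CaseRecord("case-2", "GENE_Y", frozenset({"seizures"})),
    CaseRecord("case-3", "GENE_X", frozenset({"seizures", "hearing loss"})),
]

print(find_matches(records))  # [('case-1', 'case-3', ['seizures'])]
```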
Figure 7 DNA comparison between individuals


So the digital version of you, which is the sum of your
genomic, proteomic, metabolomic, microbiomic, and
potentially other omic data, combined with a digital
record of your vital signs over time as detected by
remote sensing, is on the way. These data will be
informative, to say the least, particularly when you
compare your digital self with the digital versions of
others in your situation. The biggest single gap that
remains is interpretation of the data. But as you can
see from the progress being made with big data, it will
become very precise, very soon.